Abstract

Planning systems which make use of domain 
theories can produce more accurate plans and 
achieve more goals as the quality of their do- 
main knowledge improves. MTR, a multi-strategy 
learning system, was designed to learn from sys- 
tem failures and improve domain knowledge used 
in planning. However, augmented domain knowl- 
edge can decrease planning efficiency. We de- 
scribe how improved knowledge that becomes ex- 
pensive to use can be approximated to yield cal- 
culated tradeoffs in accuracy and efficiency. 

1 INTRODUCTION 

Successful planning and control systems in realistic do- 
mains depend on the ability to improve with experi- 
ence. One characteristic of such systems is the ability 
to recover gracefully from failures, and avoid similar 
failures in the future. The long-term objective of our
machine learning research (Kedar et al., 1991) is to
improve planning and control systems by autonomously
and systematically detecting failures and refining
domain knowledge to correct them.

Adding knowledge to a system via machine learning 
methods is not without consequent cost to the sys- 
tem making use of this knowledge. Recent research 
in machine learning has begun to address this cost in 
addition to considering system performance improve- 
ment which results from the added knowledge. The no- 
tion of a utility problem was first presented in (Minton, 
1988), to refer to the degradation of system perfor- 
mance by machine learning (specifically Explanation- 
Based Learning). Holder (1988) generalized this idea 
to other learning paradigms and performance metrics. 

Most approaches to utility analysis focus on a single 
performance system, a single learning paradigm, and 
a single measure of utility (e.g. efficiency in Minton,
1988; Tambe, 1990; or accuracy in Holder, 1991). The
utility of learned knowledge in more complex integrated
systems needs to be measured along several dimensions
at once. In this paper, we present a case
study of a multi-strategy machine learning system, 
mutual theory refinement, which refines knowledge for
an integrated reactive system, the Entropy Reduction
Engine (Drummond et al., 1991). We describe a method
for trading off two conflicting utility metrics, system
accuracy and system efficiency, in order to achieve
particular global performance objectives.

2 LEARNING IMPROVES PLAN ACCURACY

Our case study is cast within the Entropy Reduction 
Engine (ERE), a system which integrates planning and
scheduling with reaction. ERE uses operators to model 
actions, and domain constraints to model physical laws 
(e.g., “the agent cannot be in two locations at once”). 
The operators and constraints are only approximate 
models, and therefore may not always correctly pre- 
dict the results of actions. Prediction failures drive 
the learning system, mutual theory refinement (MTR)
(Kedar et al., 1991), to refine these two world models.

MTR distinguishes itself from other analytic theory
refinement methods (e.g. Hammond, 1986; Chien, 1989) in
the ability to use an approximate model, rather than 
a fully correct and complete one, to refine other ap- 
proximate models. MTR is also unique in its ability 
to switch from analytic to inductive refinement when 
the approximate models are insufficient. While reduc- 
ing prediction failures, the ultimate aim of MTR is to 
improve the overall performance of the associated sys- 
tem (e.g. ERE). We have demonstrated experimen- 
tally that MTR increases the accuracy of the associated 
ERE system, but does so while degrading its efficiency 
(Kedar & McKusick, 1992). That is, overall performance
involves an accuracy/efficiency tradeoff.

3 APPROXIMATION IMPROVES PLANNING EFFICIENCY

Learning in an integrated system needs to promote 
some global performance objectives, e.g. a certain level 
of system goal achievement given an efficiency
constraint. Unfortunately, an augmented domain theory
may be too inefficient to use given such a constraint.
Our objective here is to show that by approximating
the refined theory in an informed manner, we can
improve system efficiency while maintaining an
acceptable level of accuracy. Through experimentation,
we can anticipate how effective a particular
approximation is likely to be with respect to the global
accuracy and efficiency objectives.

We illustrate this process using data from our case
study. We use two methods of approximating our theory.
First, to improve efficiency in operator match cost
once missing preconditions have been learned, the
system approximates certain preconditions by truifying
or nullifying them (as in Keller, 1987). Second, to
improve efficiency in planning search once multiple
outcomes have been learned, the system approximates
the operator model by pruning some of the outcomes.

Figure 1 shows accuracy and efficiency results,
averaged over a set of 100 test problems, for all the
approximate theories generated using the first
approximation method. The horizontal axis plots
efficiency, as measured in match cost. The vertical axis
plots accuracy in terms of percent goal achievement.
Each point on the scatter plot represents the average
tradeoff yielded by a particular approximated theory.
Boundary points, also known as Pareto-optimal points
(Ellman, 1988), are circled. Each such point represents
a version of the refined knowledge that cannot be
improved in one dimension without degradation in the
other dimension. A system can attain global objectives
if a Pareto-optimal point exists which meets or exceeds
these objectives.

Figure 1: Tradeoffs in Efficiency and Accuracy While
Approximating Operator Preconditions.

For example, consider global objectives where desired
accuracy on a set of problems is at least 60% goal
achievement, with match cost below 700 function calls.
We find the Pareto-optimal point which best satisfies
the global objectives at 67% goal achievement. By
explicitly measuring and plotting the tradeoffs for
particular approximations, the system is able to
identify one yielding a tradeoff that is likely to
achieve the performance objectives on new tasks.

4 CONCLUDING REMARKS

The goal of approximating refined knowledge is to
achieve improvement in one utility dimension without
unacceptably degrading another. In different
situations, different approximations of the same
knowledge may be appropriate to satisfy particular
performance objectives. We are currently implementing
an ERE/MTR performance system monitor that will enable
the performance system to dynamically approximate
the knowledge, sensitive to various performance
measures and performance system components. Such
an approach could lead to a more flexible system which
achieves goals efficiently without having to limit or
destructively modify its store of learned knowledge.
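
The selection step described in Section 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ERE/MTR implementation: the `Theory` record, the function names, and every number in `theories` are hypothetical and are not taken from Figure 1. Only the two thresholds (at least 60% goal achievement, match cost below 700 function calls) come from the example in the text. We further assume that "best satisfies" means the highest accuracy among the Pareto-optimal points that meet both thresholds, since the text does not state the criterion.

```python
from typing import List, NamedTuple, Optional


class Theory(NamedTuple):
    """One approximated theory with its averaged measurements (hypothetical record)."""
    name: str          # illustrative label only
    match_cost: float  # average match cost in function calls (lower is better)
    accuracy: float    # average percent goal achievement (higher is better)


def dominates(a: Theory, b: Theory) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.match_cost <= b.match_cost and a.accuracy >= b.accuracy
    better = a.match_cost < b.match_cost or a.accuracy > b.accuracy
    return no_worse and better


def pareto_front(theories: List[Theory]) -> List[Theory]:
    """Keep the theories that no other theory dominates (the boundary points)."""
    return [t for t in theories if not any(dominates(o, t) for o in theories)]


def select_theory(theories: List[Theory],
                  min_accuracy: float,
                  max_cost: float) -> Optional[Theory]:
    """Pick the most accurate Pareto-optimal theory that meets both objectives.

    Assumption: ties and the notion of "best" are resolved by accuracy alone.
    Returns None when no Pareto-optimal point satisfies the objectives.
    """
    feasible = [t for t in pareto_front(theories)
                if t.accuracy >= min_accuracy and t.match_cost < max_cost]
    return max(feasible, key=lambda t: t.accuracy, default=None)


# Invented measurements, for illustration only (not the data behind Figure 1).
theories = [
    Theory("full", 950.0, 80.0),
    Theory("drop-p1", 680.0, 65.0),
    Theory("drop-p2", 720.0, 64.0),
    Theory("drop-p1-p2", 400.0, 45.0),
    Theory("drop-p3", 690.0, 62.0),
]

chosen = select_theory(theories, min_accuracy=60.0, max_cost=700.0)
print(chosen)
```

With these invented numbers, `drop-p2` and `drop-p3` are dominated by `drop-p1`, so the boundary consists of `full`, `drop-p1`, and `drop-p1-p2`, and only `drop-p1` meets both thresholds. When no boundary point meets the objectives, the function returns `None`, which corresponds to the case where the global objectives cannot be attained by any of the measured approximations.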